NUMBER FIFTY-TWO


The retiring editorial staff nursed from cradle through publication ten Monthlys 
per year for the last five years, also two Slaught papers. We hope most have been 
satisfactory, some excellent. Our debt to our authors and referees is great, also to 
our readers for their steady support and encouragement. 

I personally am grateful to all of my associate and collaborating editors. They 
worked hard and competently to a high professional standard. They join me in 
wishing our successors well. 

Harley Flanders 


PRIME NUMBERS AND BROWNIAN MOTION 

PATRICK BILLINGSLEY, The University of Chicago 

Because it factors into a product of prime numbers, each integer contains within 
it a kind of Brownian motion path, and the mathematics of Brownian motion can 
be used to derive theorems about the factorization. Despite the persistent notion that 
a result stated in probability language is rather less true than it might otherwise be, 
I shall state these theorems in probability language and even give them probabilistic 
proofs. As a matter of fact, there will be little in the way of real proofs, since for the 
most part I shall only illustrate general results by examples and special cases. For 
this there is the authority of William Feller, who used to tell us, his students, that the 
best in mathematics, as in art, letters, and all else — that the best consists of the 
general embodied in the concrete. Although at first I thought that was simply an 
antimilitary sentiment, I did eventually understand it as the intellectual-esthetic 
principle he intended and have tried ever since to keep it at the front of my mind. 

The paper has three sections. In Section 1, the mathematical model for a particle 
in Brownian motion is defined and some of its properties described. Section 2, which 
provides the link between Brownian motion and primes, concerns random walk: one 
successively tosses a coin and successively moves along a scale, one unit in the positive 
or negative direction, according as the coin falls heads or tails. Here it is shown how
a distant random walk looks approximately like a Brownian motion and how the 
Brownian motion model therefore leads to limit theorems associated with random 
walk. Section 3 discusses the random walk which a randomly chosen integer generates 
through its prime factorization: one successively examines the primes, 2, 3, 5, ..., and
successively moves along a scale, one unit in the positive or negative direction,
according as the prime appears in the factorization or not. It turns out that, because of
the arithmetic fact that distinct primes individually divide an integer if and only if 
their product does, this factorization random walk has many of the properties of the 
ordinary coin-tossing random walk; in particular, it too can be approximated by 
Brownian motion, and it is shown how this leads to limit theorems associated with 
factorization into primes. 

In addition to the elements of real analysis, the paper makes use of statistical 
concepts such as mean, variance, independence, and Gaussian distribution. 

1. Brownian motion. Imagine suspended in a fluid a particle bombarded by 
molecules in thermal motion. The particle will perform that irregular and seemingly 
random movement first described by the biologist Robert Brown in 1828. Since we 
shall be concerned with just one component of this motion, imagine it projected on a 
vertical axis: at each instant t of time we note the height x(t) of the particle above a
fixed horizontal plane. Over T units of time, the motion of the particle, which we
take to start at 0, is described by the positions x(t) for $0 \le t \le T$; that is, by a
continuous real function x on [0, T] with x(0) = 0. This leads us to consider the
collection $C_0[0, T]$ of such functions x.


Figure 1 (vertical axis: position)

For technical reasons, we make $C_0[0, T]$ into a metric space by taking the distance
between two of its elements to be the maximum vertical distance between their 
graphs. This topology, the uniform topology, is of little direct concern here; it is 
brought in mostly as evidence that the discussion to follow does have a rigorous 
basis. 

The random motion of the particle is described by an assignment of probabilities 
$P_T(A)$ to subsets A of $C_0[0, T]$; $P_T(A)$ represents the chance that the path traced out
by the particle lies in A, or is described by a function x that lies in A. Probabilities
represent long-run relative frequencies. If the total on a pair of dice is observed, the
possible outcomes are 2, 3, ..., 12. If many pairs of balanced dice are rolled
independently, the proportion among them producing the outcome 7 will be about
1/6. If a particle in Brownian motion is observed for T units of time, the possible
outcomes are the various elements of $C_0[0, T]$. If many independently moving
particles are observed, the proportion among them producing paths that lie in A will
be about $P_T(A)$. Although the interpretation of probability involves such multiple
observations, in the mathematical theory we speak of a single roll of the dice, the
probability the roll produces a 7 being 1/6; in the same way, we speak of a single
particle, the probability it traces out a path that lies in A being $P_T(A)$.


Figure 2 


The set $[x : \alpha \le x(t) \le \beta]$, consisting of the paths that go through the gate in
Figure 2, represents the event that at time t the particle will lie between $\alpha$ and $\beta$; it is
assigned probability

(1)    $P_T[x : \alpha \le x(t) \le \beta] = \frac{1}{\sqrt{2\pi t}} \int_\alpha^\beta e^{-u^2/2t}\,du.$

Thus the distribution of the position at time t follows the Gaussian curve with mean 0 
and variance t. That the mean is 0 reflects the fact that the particle is as likely to go 
up as to go down; there is no drift. The variance t grows linearly; this indicates that 
the particle tends to wander away from its starting point and, having done so, 
suffers no force tending to restore it to that starting point. The equation (1) can be
extended: the increment over [s, t] has a Gaussian distribution with mean 0 and
variance t - s.

The other important property of Brownian motion is this: Suppose s < s' < t < t',
and consider for example the event $A = [x : x(s') - x(s) \ge 3]$ that the particle
undergoes an upward displacement of at least 3 units during the time interval [s, s'],
together with the event $B = [x : x(t') - x(t) < 0]$ that the particle undergoes a
downward displacement during the time interval [t, t']. The top path in Figure 3 lies
in A but not in B, and the bottom path lies both in A and in B. The probabilities of
A and B and of their intersection $A \cap B$ are related by

(2)    $P_T(A \cap B) = P_T(A)\,P_T(B).$

Figure 3

Thus A and B satisfy the definition of independence; that is, that the displacement the
particle undergoes during [s, s'] in no way influences the displacement it undergoes
during [t, t']. This implies a kind of lack of memory. Although the future behavior
of the particle depends on its present position, it does not depend on how the particle 
got there. Equation (2) has a more general form showing that the increments over any 
number of disjoint intervals are statistically independent of one another. 
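
As a numerical illustration of (1) and (2), the following sketch samples the position of a particle at two times by adding independent Gaussian increments and then checks the variances empirically. The function name `sample_positions`, the seed, the two time points, and the sample size are illustrative choices, not part of the text; a covariance near 0 is of course only a necessary sign of independence, not a proof of it.

```python
import random
import statistics

def sample_positions(times, rng):
    """Sample x(t) at increasing times t by summing independent Gaussian increments.

    Assumption for this sketch: the increment over [s, t] is drawn as N(0, t - s),
    which is what (1) and (2) prescribe.
    """
    position, previous, out = 0.0, 0.0, []
    for t in times:
        position += rng.gauss(0.0, (t - previous) ** 0.5)
        previous = t
        out.append(position)
    return out

rng = random.Random(0)  # fixed seed so that the run is repeatable
paths = [sample_positions([0.5, 2.0], rng) for _ in range(20000)]

first = [p[0] for p in paths]           # increment over [0, 0.5]
second = [p[1] - p[0] for p in paths]   # increment over [0.5, 2.0]

var_at_2 = statistics.pvariance([p[1] for p in paths])   # expected near 2.0
var_increment = statistics.pvariance(second)             # expected near 1.5
covariance = sum(a * b for a, b in zip(first, second)) / len(paths)  # expected near 0
```

The empirical variance of the position at time 2 should come out close to 2, and that of the increment over [0.5, 2] close to 1.5, in line with the linear growth of the variance.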

The equations (1) and (2), together with generalized versions of them, determine
all the probabilities $P_T(A)$. (This ignores a technical point: $P_T(A)$ cannot be defined
for every subset A of $C_0[0, T]$, but it can for every Borel set A; that is, for every A
in the $\sigma$-field generated by the sets open in the uniform topology.) It was one of
Norbert Wiener’s achievements to prove in 1923 that there does exist an assignment
of probabilities satisfying these rules, and $P_T$ (the corresponding measure on the
Borel sets) is accordingly called Wiener measure. Here we shall take its existence for
granted. 

Brownian motion, as described by Wiener measure, obeys a transformation law 
having consequences strange and deep. Suppose that a particle performs a Brownian 
motion for T units of time, and suppose that, in the function representing its path, 
we contract the time scale by the factor T and the position scale by the factor $\sqrt{T}$.
According to the law in question, the new path will be exactly like that of a particle 
that has been in Brownian motion for 1 unit of time. 

To understand why, let x and y be the old and new paths, so that x lies in
$C_0[0, T]$, y lies in $C_0[0, 1]$, and

(3)    $y(t) = \frac{1}{\sqrt{T}}\, x(tT), \qquad 0 \le t \le 1.$

Of course (3) defines a mapping

(4)    $C_0[0, T] \to C_0[0, 1].$

The transformation law says that, if x is a random path in $C_0[0, T]$ distributed
according to $P_T$, then y is a random path in $C_0[0, 1]$ distributed according to $P_1$.
(Technically, if $\phi_T$ is the mapping (4), then $P_1 = P_T \phi_T^{-1}$.) Now, according to (1),
the quantity x(tT) is a Gaussian random variable with mean 0 and variance tT.
Multiplying a Gaussian random variable by a constant a multiplies its mean by a
and its variance by $a^2$, and the new variable is also Gaussian. Therefore the
distribution of y(t) as defined by (3) follows the Gaussian curve with mean 0 and
variance t (since $T^{-1/2} \cdot 0 = 0$ and $(T^{-1/2})^2 \cdot tT = t$), the first requirement for Brownian
motion. Contracting time by the factor T leads from a path over [0, T] to a path
over [0, 1], and the vertical rescaling by $1/\sqrt{T}$ makes the variances work out right.
Moreover, x has (over disjoint intervals) independent increments, and it is intuitively
clear that monotone changes of the time and position scales cannot convert
independent increments into dependent ones. So the transformation (3) must preserve
the other property of Brownian motion, that of independent increments. This 
argument, which makes the transformation law plausible, can be converted into a 
complete proof. 

By means of the transformation defined by (3), it is possible to see that, whatever
positive values $\epsilon$ and K may have, a Brownian path over [0, 1] will with probability
exceeding $1 - \epsilon$ have somewhere a chord with slope exceeding K. The trick is this:
We want, over [0, 1], a Brownian path y with a steep chord. We obtain it not directly,
but by applying the transformation (3) to a Brownian path x over [0, T] with T
suitably chosen. Choose T so large that x will, with probability exceeding $1 - \epsilon$,
have somewhere a chord with slope exceeding, say, 1. Such a T exists because even
the most miraculous event will happen in the long run (the monkeys at the
typewriters), and the occurrence of a chord with slope exceeding 1 is a modest miracle
indeed. At the same time, choose T to exceed $K^2$. If x has a chord with slope
exceeding 1, and if x and y are related by (3), then y has a chord with slope exceeding
$\sqrt{T}$, which in turn exceeds K.

Since $\epsilon$ may be taken arbitrarily small and K arbitrarily large, a Brownian path
over [0, 1] must with probability 1 have chords with arbitrarily great slope. There
must also be chords with arbitrarily large negative slope, and in fact, chords (very
short ones) with extreme slopes are dense along the path. In rigorous and more
elaborate form, these arguments show that, if A is the set of paths in $C_0[0, 1]$ of
unbounded variation, then $P_1(A) = 1$. A path of unbounded variation represents the
motion of a particle that in its wanderings back and forth travels an infinite distance,
and at this point physicists lose interest because of their obsession with reality. The
fact is mathematically interesting, however, and so is the fact that $P_1(A) = 1$ if A is
the set of functions in $C_0[0, 1]$ that are nowhere differentiable. Constructing a
continuous, nowhere differentiable function is difficult, but drawing an element from
$C_0[0, 1]$ randomly according to $P_1$ produces such a function with probability 1.

In what follows we shall be mainly concerned with sets that correspond more 
closely with reality. Although Sections 2 and 3 will involve the transformation (3) 
and T’s that exceed 1, for the rest of this section we shall take T = 1. We shall need 
(1) for the case T = t = 1:

(5)    $P_1[x : \alpha \le x(1) \le \beta] = \frac{1}{\sqrt{2\pi}} \int_\alpha^\beta e^{-u^2/2}\,du.$

Suppose $\alpha \ge 0$ and consider the event $[x : \max_t x(t) \ge \alpha]$ that the particle achieves
the height $\alpha$ at some time t with $0 \le t \le 1$. First,

    $P_1[x : \max_t x(t) \ge \alpha] = P_1[x : \max_t x(t) \ge \alpha \text{ and } x(1) \ge \alpha] + P_1[x : \max_t x(t) \ge \alpha \text{ and } x(1) < \alpha].$

The two probabilities on the right here can be proved equal, roughly because once the
particle achieves the height $\alpha$ it is as likely, in the absence of drift, to wander upward
and finish above $\alpha$ at time 1 as to wander downward and finish below $\alpha$. Thus

    $P_1[x : \max_t x(t) \ge \alpha] = 2 P_1[x : \max_t x(t) \ge \alpha \text{ and } x(1) \ge \alpha].$

Since the condition $\max_t x(t) \ge \alpha$ is superfluous in the presence of the condition
$x(1) \ge \alpha$, the right side here is $2 P_1[x : x(1) \ge \alpha]$, and (5) with $\alpha \ge 0$ and $\beta = \infty$ now
implies

(6)    $P_1[x : \max_t x(t) \ge \alpha] = \frac{2}{\sqrt{2\pi}} \int_\alpha^\infty e^{-u^2/2}\,du.$

Thus we have the distribution of the greatest positive excursion. 
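
The right sides of (5) and (6) are easy to evaluate numerically. The sketch below does so with the error function from the standard library; the function names are hypothetical, and the only fact used beyond the text is the standard identity expressing the Gaussian distribution function through erf.

```python
import math

def normal_cdf(z):
    """Distribution function of the Gaussian law with mean 0 and variance 1."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_endpoint_between(alpha, beta):
    """Right side of (5): the chance that alpha <= x(1) <= beta."""
    return normal_cdf(beta) - normal_cdf(alpha)

def prob_max_at_least(alpha):
    """Right side of (6), for alpha >= 0: the chance that max x(t) >= alpha."""
    return 2.0 * (1.0 - normal_cdf(alpha))

# The derivation of (6) says this equals twice the chance that x(1) >= alpha.
check = 2.0 * prob_endpoint_between(0.5, math.inf)
```

For alpha = 0 the function `prob_max_at_least` returns 1, as it must, since the path starts at height 0.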


Figure 4 (the set [t : x(t) > 0])


Although to make it rigorous requires some effort, this derivation of (6) has an 
intuitive appeal. The next result will be stated without any proof, and like many 
ex cathedra assertions, it runs counter to intuition. Consider the set [t : x(t) > 0] of
time points t, $0 \le t \le 1$, for which the particle is above 0. This set is a union of
intervals (infinitely many, contrary to Figure 4). Denote by bars the Lebesgue measure
of this set, the sum of the lengths of the constituent intervals: | [t : x(t) > 0] |. The
distribution of this quantity, the total time spent above 0, is given by

(7)    $P_1[x : \alpha \le |[t : x(t) > 0]| \le \beta] = \frac{1}{\pi} \int_\alpha^\beta \frac{du}{\sqrt{u(1-u)}}$

for $0 \le \alpha \le \beta \le 1$. This is Paul Lévy’s arc sine law, so called because carrying out
the integration leads to the arc sine function.


Figure 5 (horizontal axis marked at 0, $\alpha$, $\beta$, 1)

Figure 5 shows the shape of the density, the area of the shaded region representing
the right side of (7). The curve is U-shaped, so that if the length $\beta - \alpha$ of the interval
is fixed, the probability of $\alpha \le |[t : x(t) > 0]| \le \beta$ grows as the interval nears 0 or 1,
being smallest when the interval is centered on 1/2. This is odd because the time spent
above 0 has mean 1/2 by symmetry, and ordinarily values near the mean of a random
quantity are more likely to occur than are values far removed from the mean, whereas 
here the situation is just the opposite. 
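
To see the U shape in numbers, one can carry out the integration in (7) explicitly. The sketch below assumes the antiderivative $(2/\pi)\arcsin\sqrt{u}$ of the density, which may be checked by differentiation; the function name `arcsine_law` and the two sample intervals are illustrative choices.

```python
import math

def arcsine_law(alpha, beta):
    """Right side of (7): (1/pi) * integral over [alpha, beta] of du / sqrt(u (1 - u)).

    Uses the antiderivative (2/pi) * asin(sqrt(u)); requires 0 <= alpha <= beta <= 1.
    """
    return (2.0 / math.pi) * (math.asin(math.sqrt(beta)) - math.asin(math.sqrt(alpha)))

middle = arcsine_law(0.45, 0.55)  # interval of length 0.1 centered on 1/2
edge = arcsine_law(0.90, 1.00)    # interval of the same length adjacent to 1
```

With these inputs `middle` comes out near .064 and `edge` near .205, so an interval at the edge carries roughly three times the probability of an equally long interval at the center.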

For general accounts of Brownian motion, see [4] and [7]. 

2. Random walk. Imagine a particle moving about at random on the nodes of a 
cubic lattice. The particle can move in any of six directions (north, south, east, 
west, up, down) to an adjacent node. The direction is determined by the roll of a 
balanced die, the particle moves to the next node, and the die is rolled once more 
to determine the direction of the next move, and so on. Figure 6 shows five steps of 



Figure 6 



such a random walk, together with one of the cells of the cubic lattice. The figure is in 
the spirit of a venerable vector analysis book which began a proof of Gauss’s theorem
by enjoining the reader to consider “an infinitesimal element of volume of dimensions
dx, dy, and dz.” This injunction was accompanied by a nicely labelled diagram
like Figure 7, which was said to show such an infinitesimal element of volume “much 
enlarged.” Well, Figure 6 is much enlarged too, and if the cubes of the lattice are 
really very small and the particle moves very rapidly from node to node it is natural 
to expect the motion to approximate Brownian motion. 


Figure 7 (volume element; edge labels dx, dz)


We shall explore a one-dimensional version of this idea. Consider a vertical axis
with the integer points 0, ±1, ±2, ... marked off on it. We start at 0, toss a coin,
and move upward one unit if the coin falls heads and downward one unit if the coin
falls tails. In the new position (+1 or -1), we toss the coin again and move up or
down one unit according as it falls heads or tails, and we continue this way for T
steps, T being here an integer. If we take one unit of time to execute each step of this
random walk and proceed at a uniform rate from one node to the next, our progress
is described by a function like that in Figure 8, a polygonal path whose height over i
is the position at i; that is, the position after the ith step. Of the $2^T$ such paths, each
has probability $2^{-T}$. (Various aspects of random walk are discussed in [3].)



The path can also be viewed as describing the fluctuations in a gambler’s fortune. 
The position on the vertical axis represents the gambler’s fortune (relative to his 
initial capital, so that he starts conventionally at 0), and it moves up or down one 
unit — say one pound — according as he wins or loses the next play. 

The random walk path has some of the properties of a Brownian motion path
over [0, T]. In the first place, for integers with i < i' < j < j', the displacements
undergone over the time intervals [i, i'] and [j, j'] are independent because they
depend on disjoint sets of tosses and the tosses are assumed independent (the coin
has no memory). Thus the path has essentially independent increments (for intervals
with nonintegral endpoints the increments can be slightly dependent). The distance
moved in one step has mean

(8)    $(+1)\tfrac{1}{2} + (-1)\tfrac{1}{2} = 0$

and variance

(9)    $(+1)^2 \tfrac{1}{2} + (-1)^2 \tfrac{1}{2} = 1,$

and so the position at i has mean 0 and by independence has variance i, another
property of Brownian motion (see equation (1)). (For nonintegral t, the position at t
has mean 0, but the variance is only approximately t.) Although the polygonal
character of the path is not shared by Brownian motion, contraction of the two
scales will make the straight-line segments in Figure 8 disappear in the limit as $T \to \infty$.

Figure 9

Suppose we contract the time scale by a factor T and the vertical scale by a factor
$\sqrt{T}$, applying the transformation (3) to pass from Figure 8 to Figure 9. In Figure 8
the segments have length $\sqrt{2}$, whereas in Figure 9 they are very short for large T,
having length of the order $1/\sqrt{T}$. If Figure 8 represented a Brownian motion path
over [0, T], then, as explained in Section 1, Figure 9 would represent a Brownian
motion path over [0, 1]. The transformation (3) leaves invariant those characteristics
(means, variances, independence of increments) the original path shares with Brownian
motion and tends to mask those characteristics (piecewise linearity) it does not
share. Thus we can hope that the curve in Figure 9 will be very like a Brownian
motion path for large T. And indeed, it is true that

(10)    $\mathrm{Prob}[\text{path} \in A] \to P_1(A) \qquad (T \to \infty)$

for subsets A of the space $C_0[0, 1]$, where $P_1(A)$ is Wiener measure. There are $2^T$
paths like the one in Figure 9, and $\mathrm{Prob}[\text{path} \in A]$ is $2^{-T}$ times the number of them
that lie in A.
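
For small T the counting description of Prob [path ∈ A] can be carried out literally. The following sketch enumerates all coin sequences, rescales each path as in (3), and counts those lying in a set A given by a test function. The names `rescaled_paths` and `prob_path_in` are hypothetical, each path is represented only by its heights at the nodes i/T (which suffices for sets A defined through the endpoint or the maximum, since a polygonal path attains its maximum at a node), and the enumeration is feasible only for T up to about 20.

```python
from fractions import Fraction
from itertools import accumulate, product

def rescaled_paths(T):
    """Yield, for each of the 2**T coin sequences, the heights of the rescaled
    path at the times i/T (i = 0, ..., T): positions divided by sqrt(T)."""
    scale = T ** 0.5
    for steps in product((+1, -1), repeat=T):
        yield [0.0] + [s / scale for s in accumulate(steps)]

def prob_path_in(T, in_A):
    """Prob[path in A]: 2**(-T) times the number of rescaled paths passing in_A."""
    count = sum(1 for path in rescaled_paths(T) if in_A(path))
    return Fraction(count, 2 ** T)

# A = [x : -0.9 <= x(1) <= 0.9], tested on the final height of each path.
p_end = prob_path_in(12, lambda path: -0.9 <= path[-1] <= 0.9)

# A = [x : max x(t) >= 1], tested on the largest node height.
p_max = prob_path_in(4, lambda path: max(path) >= 1.0)
```

Even at T = 12 the first probability, 2508/4096 or about .61, is already close to the Gaussian value of about .63 for the same interval.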

For an illustration of this theorem, suppose the A in (10) is the set $[x : \alpha \le x(1) \le \beta]$
of paths in $C_0[0, 1]$ that over the point t = 1 have a height between $\alpha$ and $\beta$. Since
the height over t = 1 in Figure 9 is $1/\sqrt{T}$ times the position at T in the original
random walk, (10) and (5) together imply

(11)    $\mathrm{Prob}\left[\alpha \le \frac{\text{position at } T}{\sqrt{T}} \le \beta\right] \to \frac{1}{\sqrt{2\pi}} \int_\alpha^\beta e^{-u^2/2}\,du.$

This is the classical DeMoivre-Laplace central limit theorem for Bernoulli trials. It
describes the position after a large number of steps in a random walk, or the gambler’s
fortune at the end of an evening’s play of T ventures. If $-\alpha = \beta = .9$, the limit in
(11) is about .6. If T = 100, the gambler thus has probability approximately .6 of
ending the evening within $.9 \times \sqrt{100} = 9$ pounds of his initial capital.
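
The quality of the approximation for T = 100 can be checked against the exact binomial count. In the sketch below the function names are hypothetical; the exact probability uses only the fact that the position after T tosses with a given number of heads is 2·heads - T.

```python
from math import comb, erf, sqrt

def prob_position_within(T, bound):
    """Exact Prob[|position at T| <= bound] for the coin-tossing random walk."""
    favorable = sum(comb(T, heads) for heads in range(T + 1)
                    if abs(2 * heads - T) <= bound)
    return favorable / 2 ** T

def gaussian_limit(alpha, beta):
    """Right side of (11)."""
    cdf = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return cdf(beta) - cdf(alpha)

exact = prob_position_within(100, 9)   # gambler ends within 9 pounds of his start
limit = gaussian_limit(-0.9, 0.9)      # the limit in (11) with -alpha = beta = .9
```

Both numbers come out close to .632, which is the "about .6" of the text.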

Suppose now that A is the set in (6), the set of paths in $C_0[0, 1]$ having somewhere
a height at least $\alpha$ (here $\alpha \ge 0$). The path in Figure 9 lies in A if at some time during the
evening’s play the gambler’s fortune is at least $\alpha\sqrt{T}$ pounds above his initial capital,
and by (10), the probability of this converges to the right side of (6). For $\alpha = 1.7$, the
value of this limit is about .1. With T = 100, this gives an approximate probability of
.1 that the gambler will have been at least $1.7 \times \sqrt{100} = 17$ pounds ahead at the
time he should have quit.
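
The exact chance that the gambler is at some point at least 17 pounds ahead can be found by counting the paths that never get that far. The sketch below does this by a step-by-step tally; the function name and the tallying scheme are illustrative and are not taken from the text.

```python
def walk_prob_max_at_least(T, a):
    """Exact chance that a T-step coin-tossing walk reaches height a (integer a >= 1).

    counts[pos] is the number of paths so far that end at pos and have stayed
    strictly below a; paths that touch a are dropped from the tally.
    """
    counts = {0: 1}
    for _ in range(T):
        following = {}
        for pos, ways in counts.items():
            for step in (+1, -1):
                new = pos + step
                if new < a:
                    following[new] = following.get(new, 0) + ways
        counts = following
    never_reached = sum(counts.values())
    return 1 - never_reached / 2 ** T

exact_max = walk_prob_max_at_least(100, 17)
```

The result is about .089, in agreement with the limiting value of about .1 obtained from (6).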

Finally, suppose A is the set $[x : \alpha \le |[t : x(t) > 0]| \le \beta]$ in (7). During the
evening the gambler is ahead a certain fraction of the time; if the curve in Figure 9
represents the history of his fortunes, it belongs to the set A if and only if this fraction
lies between $\alpha$ and $\beta$. The chance of this event is by (10) and (7) about equal to the
area of the shaded region in Figure 5. If we compute the areas, the chance the gambler
is ahead between 45% and 55% of the time turns out to be only about .06, whereas
the chance he is ahead more than 90% of the time is about .2. In one evening in five
the gambler will thus be ahead more than 90% of that evening’s play. By symmetry, in
one evening in five the gambler will be ahead less than 10% of that evening’s play.
To convince him in the first [second] case that his experience is due merely to chance 
and not to his being Fortune’s favorite [Fortune’s fool] will be difficult [impossible]. 

We have applied (10) to three interesting sets A. If A is the set of functions in
$C_0[0, 1]$ of unbounded variation, then $P_1(A) = 1$, as explained in Section 1, while
$\mathrm{Prob}[\text{path} \in A] = 0$ because the curve in Figure 9 is visibly of bounded variation.
Thus (10) fails for certain subsets A of $C_0[0, 1]$. The mathematical fact is that (10)
holds for every set (Borel set) A whose boundary $\partial A$ (boundary in the sense of the
uniform topology) satisfies $P_1(\partial A) = 0$, a condition which holds in our three
applications but not if A is the set of functions of unbounded variation. A complete
proof of this theorem uses a combination of probability theory and functional 
analysis; the details can be found in [1]. 

3. Prime divisors. According to the fundamental theorem of arithmetic, each
integer has a factorization into primes, a factorization unique except for order (see
[5], for example). Let f(n) be the number of distinct primes in the factorization of n;
we do not count multiplicity: $f(3^4 \cdot 5^2)$ is 2, not 6. The table shows some values of the
function f.

    n       2   3   4   5   6   7  ...   29   30   31  ...  209  210  211  ...
    f(n)    1   1   1   1   2   1  ...    1    3    1  ...    2    4    1  ...

It rises slowly. The smallest n’s with respective f-values 2, 3, and 4 are
$2 \cdot 3 = 6$, $2 \cdot 3 \cdot 5 = 30$, and $2 \cdot 3 \cdot 5 \cdot 7 = 210$. The fact that there are infinitely
many primes implies, however, that f assumes arbitrarily large values; since f(p) = 1
for prime p, the same fact implies that f infinitely often drops back to 1.
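
The values in the table are easily recomputed. The sketch below counts distinct prime divisors by trial division; the function name `distinct_prime_count` is a hypothetical stand-in for f, and the method is chosen for brevity rather than speed.

```python
def distinct_prime_count(n):
    """Number of distinct primes dividing n (multiplicity not counted), for n >= 1."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:   # strip every copy of p so it is counted once
                n //= p
        p += 1
    if n > 1:                   # whatever remains is a single prime factor
        count += 1
    return count

table_n = [2, 3, 4, 5, 6, 7, 29, 30, 31, 209, 210, 211]
table_f = [distinct_prime_count(n) for n in table_n]
```

Applied to $3^4 \cdot 5^2 = 2025$, the function returns 2, in keeping with the convention that multiplicity is not counted.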

Since f varies in this irregular fashion, it is natural to ask after its average behavior.
For example, it can be shown that

(12)    $\frac{1}{N} \sum_{n=1}^{N} f(n) \approx \log\log N$

(see the remarks following (17) below). Since $\log\log 10^{70} \approx 5$, the typical integer
under $10^{70}$ has a mere five prime divisors. More delicate questions concern the
distribution of f. If S is a set of positive integers, let $P_N(S)$ be the fraction among
the integers 1, 2, ..., N that lie in S:

(13)    $P_N(S) = \frac{1}{N}\,\#[n : 1 \le n \le N \text{ and } n \in S].$

The problem is to get information about quantities like $P_N[n : a \le f(n) \le b]$.
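
The average in (12) can be computed directly for moderate N. The sketch below tabulates f by a sieve and compares the average with log log N; the function name, the choice N = 10^5, and the reading of the approximation in (12) as "the difference stays bounded" are assumptions of this illustration.

```python
import math

def distinct_prime_counts(N):
    """Return a list f with f[n] = number of distinct primes dividing n, 0 <= n <= N."""
    f = [0] * (N + 1)
    for p in range(2, N + 1):
        if f[p] == 0:   # no smaller prime divides p, so p is prime
            for multiple in range(p, N + 1, p):
                f[multiple] += 1
    return f

N = 10 ** 5
f = distinct_prime_counts(N)
average = sum(f[1:]) / N
loglog = math.log(math.log(N))

loglog_70 = math.log(math.log(10 ** 70))   # the quantity quoted as about 5
```

For this N the average and log log N differ by a few tenths, a gap that does not grow with N even though both quantities tend (very slowly) to infinity.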

Now (13) can be viewed as a probability: We draw an integer at random from the
range $1 \le n \le N$, and $P_N(S)$ is the probability that it will lie in S. That $P_N[n : a \le f(n) \le b]$
can be viewed as a probability does not by itself ensure (this may be difficult to credit)
that probability theory will help in the evaluation. It does in fact help because the
notion of independence can be brought to bear. If $\delta_p(n)$ is 1 or 0 according as the
prime p divides n or not, then $f(n) = \sum_p \delta_p(n)$. We can understand the distribution
of f(n) if we understand the joint behavior of the $\delta_p(n)$ as random quantities.

The number of multiples of p up to N is the integral part [N/p] of N/p. The
probability that $\delta_p(n) = 1$, or that p | n, is thus

(14)    $P_N[n : p \mid n] = \frac{1}{N}\left[\frac{N}{p}\right] \approx \frac{1}{p}.$

The approximation here is good for large N: since [N/p] differs from N/p by less
than 1, the error in (14) is less than 1/N. The formula (14) reflects the fact that p
divides every pth integer, and it in no way requires that p be prime.

The fundamental theorem of arithmetic implies that, if integers a and b are
relatively prime (share no prime factors), then they individually divide n if and only if
their product ab divides n. This fact is well illustrated by the use Turing is said to
have made of it. The sprocket wheel of his bicycle had a faulty tooth and the chain a
faulty link, and unless he was pedalling very fast when the faulty parts meshed, the
chain would fall off. So he counted the number, say a, of teeth on the wheel and the
number, say b, of links on the chain and found, not to his surprise, that a and b were
relatively prime. Between successive meetings of the bad tooth and link the sprocket
wheel would in consequence go through b cycles, as the chain went through a cycles.
Turing is said to have pedalled along counting, on every bth cycle of the sprocket
wheel giving the burst of speed necessary to carry him past the danger point.

As a special case of this fact, distinct primes p and q individually divide n if and
only if pq does. By this and by (14) with pq in place of p,

    $P_N[n : p \mid n \text{ and } q \mid n] = P_N[n : pq \mid n] \approx \frac{1}{pq} = \frac{1}{p} \cdot \frac{1}{q}.$

Since by (14) the factors 1/p and 1/q respectively approximate $P_N[n : p \mid n]$ and
$P_N[n : q \mid n]$ if N is large, we arrive at

(15)    $P_N[n : p \mid n \text{ and } q \mid n] \approx P_N[n : p \mid n]\, P_N[n : q \mid n].$

Thus the events [n : p | n] and [n : q | n] approximately satisfy the definition of
independence if n is random, $1 \le n \le N$, with N large. There is an extension of (15)
from two primes to three or more.
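
Both (14) and (15) can be verified by direct counting. In the sketch below, `P_N` is a hypothetical helper implementing definition (13) literally, and the values N = 10,000, p = 3, q = 7 are arbitrary; exact rational arithmetic is used so that the error bounds can be tested without rounding.

```python
from fractions import Fraction

def P_N(N, predicate):
    """P_N(S) of (13) for S = [n : predicate(n)], by direct counting."""
    return Fraction(sum(1 for n in range(1, N + 1) if predicate(n)), N)

N, p, q = 10_000, 3, 7
P_p = P_N(N, lambda n: n % p == 0)
P_q = P_N(N, lambda n: n % q == 0)
P_both = P_N(N, lambda n: n % p == 0 and n % q == 0)
P_pq = P_N(N, lambda n: n % (p * q) == 0)

error_14 = abs(P_p - Fraction(1, p))    # (14) says this is less than 1/N
error_15 = abs(P_both - P_p * P_q)      # the discrepancy in (15)
```

Here `P_both` and `P_pq` coincide exactly, which is the arithmetic fact about distinct primes, while the discrepancy in (15) is of order 1/N.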

We can use this fact to construct a kind of random walk path containing
information about the prime factorization of n and in particular about f(n). We draw an
integer n at random from among 1, 2, ..., N. On a vertical axis with the integer points
marked off on it, we start at 0 and go up one unit if $2 \mid n$ and down one unit if $2 \nmid n$.
From our new position (+1 or -1), we go up one unit if $3 \mid n$ and down one unit if
$3 \nmid n$. We proceed in this way, examining each prime in succession. Figure 10
describes this factorization random walk in the same way that Figure 8 describes the 
coin-tossing random walk. Each number on the time axis is the prime corresponding 
to that step in the random walk. We consider later how long to continue the walk. 


Figure 10 



Since n is random, this path is random. But since the randomness is all in the 
drawing of n before the walk starts, the factorization random walk may seem less 
random than the coin-tossing random walk. This is an illusion. We may imagine 
tossing the coin T times in advance of the walk, recording the sequence of heads and 
tails, and only then performing the corresponding walk. Since we would see its 
whole history on record before setting out, the walk would be very dull. So imagine a 
friend who tosses the coin T times and records the results in advance of the journey, 
and imagine that, rather than show us the record all at once, he instead reveals the 
outcomes to us one by one as we execute the walk. This restores the suspense. For
the factorization random walk, we can imagine a friend who draws n at random,
$1 \le n \le N$, factors n into primes, and at each step of the walk reveals to us whether or
not the corresponding p divides n.

The increment of the random path in Figure 10 over an interval depends on how
many in the corresponding set of primes divide n. Increments over disjoint intervals
depend on disjoint sets of primes and hence by (15), or by (15) together with its
extension to three or more primes, the increments will be approximately independent
if N is large. Unlike Brownian motion, however, the factorization random walk has
a strong downward drift. By (14), the chance of going downward at the step
corresponding to p is about 1 - 1/p, which is almost 1 for large p. The remedy is to
move up a distance 1 - 1/p if $p \mid n$ and to move down only a distance 1/p if $p \nmid n$.
The expected distance moved is now

    $(1 - p^{-1})\,P_N[n : p \mid n] + (-p^{-1})\,P_N[n : p \nmid n],$

which by (14) is approximately

    $\left(1 - \frac{1}{p}\right)\frac{1}{p} + \left(-\frac{1}{p}\right)\left(1 - \frac{1}{p}\right) = 0.$

This corresponds with (8), an equation which shows that the coin-tossing random
walk has no drift.


Figure 11 



Since the mean distance moved at the step corresponding to p is approximately 0,
the variance is approximately $(1 - p^{-1})^2 P_N[n : p \mid n] + (-p^{-1})^2 P_N[n : p \nmid n]$, which by
(14) is in turn approximately

    $\left(1 - \frac{1}{p}\right)^2 \frac{1}{p} + \left(\frac{1}{p}\right)^2 \left(1 - \frac{1}{p}\right) = \frac{1}{p}\left(1 - \frac{1}{p}\right) \approx \frac{1}{p}.$

The distance moved thus tends to be very small for large p, in contrast with the
coin-tossing random walk, which by (9) proceeds with vigor ever undiminished.
The remedy this time is to spend only an amount of time 1/p executing the step
corresponding to p. With these two modifications, the path is as in Figure 11.

To recapitulate, the time interval corresponding to the prime p has length 1/p.
Over this interval, the path rises an amount $\delta_p(n) - 1/p$; that is, it rises 1 - 1/p if
$p \mid n$ (the probability of this is approximately 1/p) and it rises 0 - 1/p if $p \nmid n$ (the
probability of this is approximately 1 - 1/p).
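
The recapitulated construction translates directly into a short routine that lists the nodes of the adjusted path for a given n. In this sketch the names `primes_up_to` and `factorization_walk` and the cutoff parameter `limit` are hypothetical; exact fractions are used so that the node coordinates can be compared with hand computation.

```python
from fractions import Fraction

def primes_up_to(limit):
    """Primes p <= limit, by trial division (adequate for small limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def factorization_walk(n, limit):
    """Nodes (time, height) of the adjusted walk for n, one node per prime <= limit.

    The step for the prime p takes time 1/p and rises delta_p(n) - 1/p,
    where delta_p(n) is 1 if p divides n and 0 otherwise.
    """
    time, height = Fraction(0), Fraction(0)
    nodes = [(time, height)]
    for p in primes_up_to(limit):
        delta = 1 if n % p == 0 else 0
        time += Fraction(1, p)
        height += delta - Fraction(1, p)
        nodes.append((time, height))
    return nodes

nodes = factorization_walk(30, 7)   # n = 30 examined at the primes 2, 3, 5, 7
```

For n = 30 the final node is at time 1/2 + 1/3 + 1/5 + 1/7 = 247/210 and height 3 - 247/210, the number of prime divisors up to 7 less the elapsed time.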
The point t in Figure 11 (the right endpoint of the interval corresponding to p) is
$\sum_{q \le p} 1/q$ (summation over primes q not exceeding p). The distance moved in the
step corresponding to q has variance about 1/q, and hence by the approximate
independence of the steps ((15) again), the variance of the position at this time t is
approximately $\sum_{q \le p} 1/q$, or t itself. The above adjustment of the factorization random
walk has thus not only eliminated the drift, it has so adjusted the time scale that the 
variances are about what they are for the coin-tossing random walk and for Brownian 
motion. 

It can be shown that 

(16)    $\sum_{q \le u} \frac{1}{q} \approx \log\log u$

for large u (the two expressions go to infinity with u and their difference remains 
bounded; see [5, p. 351]). That the sum in (16), instead of increasing in some erratic 
fashion, is asymptotic to a standard function like log log u is inessential to what 
follows; but the formulas become simpler (and remain valid) if at each occurrence of 
the sum we substitute the right side of (16). 
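
The boundedness of the difference between the two sides of (16) shows up clearly in computation. The sketch below sums the reciprocals of the primes up to u with a simple sieve; the function name and the three sample values of u are illustrative choices.

```python
import math

def prime_reciprocal_sum(u):
    """Sum of 1/q over primes q <= u (integer u >= 1), by the sieve of Eratosthenes."""
    is_prime = [True] * (u + 1)
    is_prime[0] = is_prime[1] = False
    total = 0.0
    for q in range(2, u + 1):
        if is_prime[q]:
            total += 1.0 / q
            for multiple in range(q * q, u + 1, q):
                is_prime[multiple] = False
    return total

differences = [prime_reciprocal_sum(u) - math.log(math.log(u))
               for u in (10 ** 3, 10 ** 4, 10 ** 5)]
```

The three differences all lie between .26 and .27, while the sums themselves grow from about 2.2 to about 2.7.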

Thus the t in Figure 11 is essentially log log p and the height $\sum_{q \le p} (\delta_q(n) - 1/q)$
of the curve over t is approximately

(17)    $\sum_{q \le p} \delta_q(n) - \log\log p.$

Now n has $\sum_{q \le p} \delta_q(n)$ prime divisors that do not exceed p, and we normalize this
quantity by subtracting away the value log log p it has for a “typical” n. (If n is
random, $1 \le n \le N$, then $\sum_{q \le p} \delta_q(n)$ has by (14) and (16) a mean of about log log p;
this is where (12) comes from.) The factorization random walk is a record of these
differences (17). We continue the walk until each $p \le N$ has been dealt with, and the
corresponding point on the time axis is $T = \sum_{p \le N} 1/p \approx \log\log N$.


Figure 12


The random path now resembles a coin-tossing path in that the increments are
almost independent for large N, there is essentially no drift, and the variances are
about right. As in the coin-tossing case, rescaling will lead in the limit ($N \to \infty$) to
Brownian motion. To send T to the point 1, we contract the horizontal scale by a
factor T = log log N, and, again as in the coin-tossing case and for the same reasons,
we contract the vertical scale by the square root of this, applying the transformation
(3). The point t in Figure 11 goes to log log p / log log N, and the path is that shown in
Figure 12.
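The rescaling just described can be sketched in a few lines. This is an illustrative construction only: the function name `rescaled_path` is hypothetical, and, as an assumption made here, the exact sum T of 1/p over primes p up to N is used in place of its approximation log log N, so that the path ends exactly at time 1.

```python
import math

def primes_up_to(limit):
    """Primes <= limit by trial division (adequate for small limits)."""
    return [m for m in range(2, limit + 1)
            if all(m % d for d in range(2, math.isqrt(m) + 1))]

def rescaled_path(n, N):
    """Break points (time, height) of the rescaled factorization walk.

    Assumption: T is the exact sum of 1/p for primes p <= N rather than
    log log N. Time is contracted by T and height by sqrt(T).
    """
    primes = primes_up_to(N)
    T = sum(1.0 / p for p in primes)
    points = [(0.0, 0.0)]
    time = 0.0
    height = 0.0
    for p in primes:
        delta = 1.0 if n % p == 0 else 0.0  # delta_p(n): does p divide n?
        time += 1.0 / p                      # the step for p lasts 1/p
        height += delta - 1.0 / p            # increment with the drift removed
        points.append((time / T, height / math.sqrt(T)))
    return points

if __name__ == "__main__":
    for t, h in rescaled_path(30, 30):
        print(round(t, 4), round(h, 4))
```

Joining the returned points by straight segments gives a polygonal path on [0, 1] of the kind plotted in Figure 12.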

Since the path depends on n and N, denote it $\mathrm{path}_N(n)$. Since n is random
($1 \le n \le N$), so is the path, and the chance that it lies in a given subset A of $C_0[0, 1]$ is
$P_N[n: \mathrm{path}_N(n) \in A]$. The theorem linking primes with Brownian motion is this: If A
is a subset (Borel subset) of $C_0[0, 1]$ satisfying $P(\partial A) = 0$, then

(18)    $P_N[n: \mathrm{path}_N(n) \in A] \to P(A)$    $(N \to \infty)$,

where P(A) is Wiener measure. The proof of (18) uses a combination of probability
theory, functional analysis, and number theory. The theorem is given implicitly in
[8, p. 122], explicitly in a manuscript version of [1] and in a much more general
form in [9]. (For general discussions of probability methods in number theory, see
[6], [8] and the author’s 1973 Wald lectures, to appear in the Annals of Probability.)
From Figure 12, a plot of the differences (17) normalized to

(19)    $\left( \sum_{q \le p} \delta_q(n) - \log\log p \right) / \sqrt{\log\log N}$,

we can read off arithmetic properties of n, and therefore (18) yields arithmetic limit
theorems. Consider the three sets A to which we applied the analogous result (10).
The height of the curve in Figure 12 over the time point 1 is (19) with N in place of p;
it is the number f(n) of prime factors of n, normalized to

$(f(n) - \log\log N) / \sqrt{\log\log N}$.

The greater this is, the more highly composite n is, and the smaller it is, the more
“prime-like” n is. With $A = [x: \alpha \le x(1) \le \beta]$, it follows by (18) and (5) that


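(20)    $P_N\left[ n: \alpha \le \frac{f(n) - \log\log N}{\sqrt{\log\log N}} \le \beta \right] \to \frac{1}{\sqrt{2\pi}} \int_\alpha^\beta e^{-u^2/2} \, du$.

This is the Erdős-Kac central limit theorem for f. (For an elementary direct proof of
(20), see [2].)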

For $-\alpha = \beta = .9$, the limit in (20) is about .6, and if $N = 10^{70}$, so that
$\log\log N \approx 5$, the double inequality in (20) is approximately the same as
$-.9 \le (f(n) - 5)/\sqrt{5} \le .9$, which in turn is approximately the same as $3 \le f(n) \le 7$. Thus
something like 60% of the integers under $10^{70}$ have from 3 to 7 prime divisors.
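The arithmetic in this example is easy to reproduce. The sketch below evaluates the normal limit in (20) for the interval from -.9 to .9 and the resulting window around log log N; the function names are illustrative, and the standard normal distribution function is computed from `math.erf`.

```python
import math

def standard_normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def erdos_kac_limit(alpha, beta):
    """Right side of (20): normal probability of the interval [alpha, beta]."""
    return standard_normal_cdf(beta) - standard_normal_cdf(alpha)

def log_log_power_of_ten(exponent):
    """log log N for N = 10**exponent, computed without forming N as a float."""
    return math.log(exponent * math.log(10.0))

if __name__ == "__main__":
    center = log_log_power_of_ten(70)       # roughly 5
    half_width = 0.9 * math.sqrt(center)    # roughly 2
    print(center, half_width, erdos_kac_limit(-0.9, 0.9))
```

The limit comes out near .63, and the window is roughly 5 plus or minus 2, which matches the range of 3 to 7 prime divisors quoted in the text.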

The larger (17) is, the more highly composite n appears to be at that point in the 
factorization; that is, (17) measures the apparent compositeness of n when it has been 
tested for divisibility only by primes up to p. The maximum apparent compositeness 
is measured by 

(21)    $\max_{p \le N} \left( \sum_{q \le p} \delta_q(n) - \log\log p \right)$;

since this is $\sqrt{\log\log N}$ times the maximum height of the curve in Figure 12, an
application of (18) to the set in (6) gives its approximate distribution. The right side
of (6) being about .1 if $\alpha = 1.7$, for about 10% of the integers under $10^{70}$ does (21)
exceed $1.7 \times \sqrt{5} \approx 3.8$.
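These two numbers can be checked as follows. The sketch assumes that the right side of (6) is the usual reflection-principle expression for the maximum of Brownian motion on [0, 1], namely twice the upper normal tail; formula (6) itself lies outside this excerpt, so that identification is an assumption. The function name `max_exceeds_probability` is illustrative.

```python
import math

def max_exceeds_probability(alpha):
    """Assumed form of the right side of (6): 2 * (1 - Phi(alpha)).

    Uses the identity 2 * (1 - Phi(a)) = erfc(a / sqrt(2)), valid for a >= 0.
    """
    return math.erfc(alpha / math.sqrt(2.0))

if __name__ == "__main__":
    alpha = 1.7
    print(max_exceeds_probability(alpha))   # a little under .1
    print(alpha * math.sqrt(5.0))           # threshold for (21), about 3.8
```

Under that assumption the probability is close to .09, in line with the "about .1" of the text.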

Let us say that n is excessive at p if 

(22)    $\sum_{q \le p} \delta_q(n) > \log\log p$;

this holds if, with respect to divisibility by primes up to p, n is “more composite,” or
“less prime-like,” than the average integer. And (22) holds exactly when the corresponding
point on the curve in Figure 12 is above the axis. The polygonal segment
corresponding to p has length $p^{-1} / \log\log N$ when projected on the horizontal axis,
and so the amount of time the curve spends above 0 is essentially

(23)    $\frac{1}{\log\log N} \sum \left[ \frac{1}{p} : p \le N \text{ and } \sum_{q \le p} \delta_q(n) > \log\log p \right]$,

the sum extending over those p at which n is excessive.

If we test n for divisibility by the primes in succession, spending an amount
1/p of time on p ($p \le N$), (23) is the fraction of time we are dealing with a p at which
n is excessive. From an application of (18) to the set in (7) it follows that for large N
the distribution of (23) approximately follows the density curve in Figure 5. For
about 20% of the integers under N the quantity (23) exceeds .9, for about 20% it is
less than .1, and for only about 6% does it lie between .45 and .55.
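The three percentages agree with the arc sine law. The sketch below assumes that the density in Figure 5 is the arc sine density on [0, 1], whose distribution function is (2/π) arcsin √x; Figure 5 is not reproduced in this excerpt, so this is an assumption. The function name `arcsine_cdf` is illustrative.

```python
import math

def arcsine_cdf(x):
    """Distribution function of the arc sine law on [0, 1] (assumed here)."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

if __name__ == "__main__":
    print(1.0 - arcsine_cdf(0.9))                 # chance that (23) exceeds .9
    print(arcsine_cdf(0.1))                       # chance that (23) is below .1
    print(arcsine_cdf(0.55) - arcsine_cdf(0.45))  # chance of lying in [.45, .55]
```

The first two values are each close to .20 and the third is close to .06, as stated.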

Prime factors exhibit in this respect the same strange behavior coins do. In a way 
they are even more strange. A quantity perhaps more natural to consider than (23) is 

(24)    $\frac{1}{\pi(N)} \# \left[ p : p \le N \text{ and } \sum_{q \le p} \delta_q(n) > \log\log p \right]$,

the number of p for which n is excessive at p, normalized by division by $\pi(N)$, the
total number of primes involved. For N large, of the break points in the polygon in
Figure 12 the great majority are very near 1, which has the result that in the limit the
distribution of (24) consists of a mass of 1/2 at 0 and a mass of 1/2 at 1: If $\varepsilon > 0$ and N
exceeds some $N_\varepsilon$, then (24) is less than $\varepsilon$ with a probability lying in the range
from $1/2 - \varepsilon$ to $1/2 + \varepsilon$ and is greater than $1 - \varepsilon$ with a probability lying in the
same range. Thus practically all integers are excessive either at practically all primes
or at practically none.
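To make the two normalizations concrete, the following sketch computes both (23) and (24) for a given n and N. The function name `excess_statistics` is hypothetical, N is assumed to be at least 3 so that log log N is positive, and for small N the values are only a rough guide to the limiting behavior described above.

```python
import math

def primes_up_to(limit):
    """Primes <= limit by trial division (adequate for small limits)."""
    return [m for m in range(2, limit + 1)
            if all(m % d for d in range(2, math.isqrt(m) + 1))]

def excess_statistics(n, N):
    """Return (time_fraction, count_fraction), i.e. quantities (23) and (24)."""
    primes = primes_up_to(N)
    divisors_so_far = 0   # running sum of delta_q(n) over primes q <= p
    weighted = 0.0        # sum of 1/p over primes at which n is excessive
    count = 0             # number of primes at which n is excessive
    for p in primes:
        if n % p == 0:
            divisors_so_far += 1
        if divisors_so_far > math.log(math.log(p)):  # condition (22)
            weighted += 1.0 / p
            count += 1
    return weighted / math.log(math.log(N)), count / len(primes)

if __name__ == "__main__":
    for n in (2, 30, 97):
        print(n, excess_statistics(n, 100))
```

With N = 100, the integer n = 2 is excessive at the six primes 2, 3, 5, 7, 11 and 13 only, so (24) is 6/25 while (23), which weights each prime p by 1/p, is much larger.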


The 1972 Rouse Ball Lecture, given while the author was a Guggenheim Fellow, visiting
Peterhouse and the Statistical Laboratory of the University of Cambridge. It appeared in
somewhat different form in Eureka, the Journal of the Archimedeans, the Cambridge University
Mathematical Society.
CORRECTION TO “UNIQUE FACTORIZATION DOMAINS”

P. M. Cohn, Bedford College, University of London 

The statement “Any Noetherian UFD is a Dedekind domain” (this Monthly, 
80 (1973) 1-18) should be omitted. 

The assertion is of course well known to be false; a correct statement would be: 
A Dedekind domain is a UFD if and only if it is a principal ideal domain. 

I am indebted to Professor J. H. Hays for drawing my attention to this error. 


CORRECTION TO “A HISTORY OF THE PRIME NUMBER THEOREM” 

L. J. Goldstein, University of Maryland 

In my paper, [this Monthly, 80 (June-July, 1973) 599-615] I asserted that the 
sieve of Eratosthenes was known to the ancient Greeks and, in fact, appeared in 
Euclid. It has been pointed out to me by Professor J. Albree that although the sieve 
was known since approximately the time of Euclid, it does not appear in the Elements. 
The author regrets the error. 



